Generalized Zurek's bound on the cost of an individual classical or quantum computation
We consider the minimal thermodynamic cost of an individual computation, where a
single input $x$ is mapped into a single output $y$. In prior work, Zurek proposed
that this cost was given by $K(x|y)$, the conditional Kolmogorov complexity of $x$
given $y$ (up to an additive constant which does not depend on $x$ or $y$). However,
this result was derived from an informal argument, applied only to deterministic
computations, and had an arbitrary dependence on the choice of protocol (via the
additive constant). Here we use stochastic thermodynamics to derive a generalized
version of Zurek's bound from a rigorous Hamiltonian formulation. Our bound applies
to all quantum and classical processes, whether noisy or deterministic, and it
explicitly captures the dependence on the protocol. We show that $K(x|y)$ is a
minimal cost of mapping $x$ to $y$ that must be paid using some combination of heat,
noise, and protocol complexity, implying a tradeoff between these three resources.
Our result is a kind of "algorithmic fluctuation theorem" with implications for
the relationship between the Second Law and the Physical Church-Turing thesis.
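Schematically, the three-way tradeoff described above can be read as a single inequality of the form

\[
  \frac{Q(x \to y)}{k_B T \ln 2} \;+\; \bigl(-\log_2 p(y \mid x)\bigr) \;+\; K(\text{protocol}) \;\gtrsim\; K(x \mid y),
\]

where $Q(x \to y)$ is the heat generated, $-\log_2 p(y \mid x)$ quantifies the noise of the map, and $K(\text{protocol})$ is the algorithmic complexity of the driving protocol. This display is only an illustrative reading of "heat, noise, and protocol complexity"; the precise statement, constants, and protocol dependence are those derived in the paper.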
Dependence of dissipation on the initial distribution over states
We analyze how the amount of work dissipated by a fixed nonequilibrium
process depends on the initial distribution over states. Specifically, we
compare the amount of dissipation when the process is used with some specified
initial distribution to the minimal amount of dissipation possible for any
initial distribution. We show that the difference between those two amounts of
dissipation is given by a simple information-theoretic function that depends
only on the initial and final state distributions. Crucially, this difference
is independent of the details of the process relating those distributions. We
then consider how dissipation depends on the initial distribution for a
'computer', i.e., a nonequilibrium process whose dynamics over coarse-grained
macrostates implement some desired input-output map. We show that our results
still apply when stated in terms of distributions over the computer's
coarse-grained macrostates. This can be viewed as a novel thermodynamic cost of
computation, reflecting changes in the distribution over inputs rather than the
logical dynamics of the computation.
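In schematic form, writing $\Sigma(p_0)$ for the dissipation incurred when the process is run on initial distribution $p_0$, $q_0$ for the dissipation-minimizing initial distribution, and $p_\tau$, $q_\tau$ for the corresponding final distributions, the result described above can be read as

\[
  \Sigma(p_0) \;-\; \Sigma(q_0) \;=\; D\!\left(p_0 \,\|\, q_0\right) \;-\; D\!\left(p_\tau \,\|\, q_\tau\right),
\]

with $D(\cdot\|\cdot)$ the Kullback-Leibler divergence. The notation and units here are illustrative, chosen to match the verbal statement of the abstract rather than quoted from the paper.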
Estimating Mixture Entropy with Pairwise Distances
Mixture distributions arise in many parametric and non-parametric settings --
for example, in Gaussian mixture models and in non-parametric estimation. It is
often necessary to compute the entropy of a mixture, but, in most cases, this
quantity has no closed-form expression, making some form of approximation
necessary. We propose a family of estimators based on a pairwise distance
function between mixture components, and show that this estimator class has
many attractive properties. For many distributions of interest, the proposed
estimators are efficient to compute, differentiable in the mixture parameters,
and become exact when the mixture components are clustered. We prove this
family includes lower and upper bounds on the mixture entropy. The Chernoff
$\alpha$-divergence gives a lower bound when chosen as the distance function,
with the Bhattacharyya distance providing the tightest lower bound for
components that are symmetric and members of a location family. The
Kullback-Leibler divergence gives an upper bound when used as the distance
function. We provide closed-form expressions of these bounds for mixtures of
Gaussians, and discuss their applications to the estimation of mutual
information. We then demonstrate that our bounds are significantly tighter than
well-known existing bounds using numeric simulations. This estimator class is
very useful in optimization problems involving maximization/minimization of
entropy and mutual information, such as MaxEnt and rate distortion problems.
Comment: Corrects several errata in published version, in particular in
Section V (bounds on mutual information).
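As a concrete illustration, here is a minimal numpy sketch for a Gaussian mixture, assuming the estimator family takes the pairwise-distance form $\hat{H}_D = \sum_i w_i H(p_i) - \sum_i w_i \ln \sum_j w_j e^{-D(p_i, p_j)}$, with the KL divergence yielding an upper bound and the Bhattacharyya distance a lower bound; the function names are illustrative, not from an existing library.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def kl_gaussian(mu_i, cov_i, mu_j, cov_j):
    """KL divergence D(N_i || N_j) between two multivariate Gaussians."""
    d = len(mu_i)
    inv_j = np.linalg.inv(cov_j)
    diff = mu_j - mu_i
    return 0.5 * (np.trace(inv_j @ cov_i) + diff @ inv_j @ diff - d
                  + np.linalg.slogdet(cov_j)[1] - np.linalg.slogdet(cov_i)[1])

def bhattacharyya_gaussian(mu_i, cov_i, mu_j, cov_j):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov_i + cov_j)
    diff = mu_i - mu_j
    return (0.125 * diff @ np.linalg.inv(cov) @ diff
            + 0.5 * np.linalg.slogdet(cov)[1]
            - 0.25 * (np.linalg.slogdet(cov_i)[1] + np.linalg.slogdet(cov_j)[1]))

def pairwise_entropy_estimate(weights, mus, covs, dist):
    """Estimator: sum_i w_i H(p_i) - sum_i w_i ln( sum_j w_j exp(-dist(p_i, p_j)) )."""
    est = sum(w * gaussian_entropy(c) for w, c in zip(weights, covs))
    for i, w_i in enumerate(weights):
        inner = sum(w_j * np.exp(-dist(mus[i], covs[i], mus[j], covs[j]))
                    for j, w_j in enumerate(weights))
        est -= w_i * np.log(inner)
    return est

# Two-component Gaussian mixture in 2D.
weights = [0.5, 0.5]
mus = [np.zeros(2), np.array([3.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
upper = pairwise_entropy_estimate(weights, mus, covs, kl_gaussian)             # upper bound
lower = pairwise_entropy_estimate(weights, mus, covs, bhattacharyya_gaussian)  # lower bound
print(lower, upper)
```

With either distance the estimate reduces to the component entropy when all components coincide and to $\sum_i w_i H(p_i) + H(w)$ when the components are far apart, consistent with the exactness property mentioned above.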
A novel approach to multivariate redundancy and synergy
Consider a situation in which a set of "source" random variables $X_1, \ldots, X_n$
have information about some "target" random variable $Y$.
For example, in neuroscience $Y$ might represent the state of an external
stimulus and $X_1, \ldots, X_n$ the activity of $n$ different brain regions.
Recent work in information theory has considered how to decompose the
information that the sources provide about the target
into separate terms such as (1) the "redundant information" that is shared
among all of the sources, (2) the "unique information" that is provided only by a
single source, (3) the "synergistic information" that is provided by all
sources only when considered jointly, and (4) the "union information" that is
provided by at least one source. We propose a novel framework deriving such a
decomposition that can be applied to any number of sources. Our measures are
motivated in three distinct ways: via a formal analogy to intersection and
union operators in set theory, via a decision-theoretic operationalization
based on Blackwell's theorem, and via an axiomatic derivation. A key aspect of
our approach is that we relax the assumption that measures of redundancy and
union information should be related by the inclusion-exclusion principle. We
discuss relations to previous proposals as well as possible generalizations.
Caveats for information bottleneck in deterministic scenarios
Information bottleneck (IB) is a method for extracting information from one
random variable $X$ that is relevant for predicting another random variable
$Y$. To do so, IB identifies an intermediate "bottleneck" variable $T$ that has
low mutual information $I(X;T)$ and high mutual information $I(Y;T)$. The "IB
curve" characterizes the set of bottleneck variables that achieve maximal
$I(Y;T)$ for a given $I(X;T)$, and is typically explored by maximizing the "IB
Lagrangian", $\mathcal{L}_{\mathrm{IB}} = I(Y;T) - \beta I(X;T)$. In some cases, $Y$ is a deterministic
function of $X$, including many classification problems in supervised learning
where the output class $Y$ is a deterministic function of the input $X$. We
demonstrate three caveats when using IB in any situation where $Y$ is a
deterministic function of $X$: (1) the IB curve cannot be recovered by
maximizing the IB Lagrangian for different values of $\beta$; (2) there are
"uninteresting" trivial solutions at all points of the IB curve; and (3) for
multi-layer classifiers that achieve low prediction error, different layers
cannot exhibit a strict trade-off between compression and prediction, contrary
to a recent proposal. We also show that when $Y$ is a small perturbation away
from being a deterministic function of $X$, these three caveats arise in an
approximate way. To address problem (1), we propose a functional that, unlike
the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the
three caveats on the MNIST dataset.
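One way to see caveat (1), hedging on details not spelled out in the abstract: when $Y = f(X)$, data processing gives $I(Y;T) \le I(X;T)$ and $I(Y;T) \le H(Y)$, and both constraints can be saturated by suitable bottlenecks, so the IB curve is the piecewise-linear frontier

\[
  I(Y;T) \;=\; \min\{\, I(X;T),\; H(Y) \,\}.
\]

For any fixed $\beta \in (0,1)$, the objective $I(Y;T) - \beta I(X;T)$ is maximized on this frontier only at the corner point $I(X;T) = H(Y)$, so sweeping $\beta$ never traces out the linear segments of the curve.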
Semantic information, autonomous agency, and nonequilibrium statistical physics
Shannon information theory provides various measures of so-called "syntactic
information", which reflect the amount of statistical correlation between
systems. In contrast, the concept of "semantic information" refers to those
correlations which carry significance or "meaning" for a given system. Semantic
information plays an important role in many fields, including biology,
cognitive science, and philosophy, and there has been a long-standing interest
in formulating a broadly applicable and formal theory of semantic information.
In this paper we introduce such a theory. We define semantic information as the
syntactic information that a physical system has about its environment which is
causally necessary for the system to maintain its own existence. "Causal
necessity" is defined in terms of counter-factual interventions which scramble
correlations between the system and its environment, while "maintaining
existence" is defined in terms of the system's ability to keep itself in a low
entropy state. We also use recent results in nonequilibrium statistical physics
to analyze semantic information from a thermodynamic point of view. Our
framework is grounded in the intrinsic dynamics of a system coupled to an
environment, and is applicable to any physical system, living or otherwise. It
leads to formal definitions of several concepts that have been intuitively
understood to be related to semantic information, including "value of
information", "semantic content", and "agency"
Nonlinear Information Bottleneck
Information bottleneck (IB) is a technique for extracting information in one
random variable $X$ that is relevant for predicting another random variable
$Y$. IB works by encoding $X$ in a compressed "bottleneck" random variable $M$
from which $Y$ can be accurately decoded. However, finding the optimal
bottleneck variable involves a difficult optimization problem, which until
recently has been considered for only two limited cases: discrete $X$ and $Y$
with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint
distribution (in which case optimal encoding and decoding maps are linear). We
propose a method for performing IB on arbitrarily-distributed discrete and/or
continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding
maps. Our approach relies on a novel non-parametric upper bound for mutual
information. We describe how to implement our method using neural networks. We
then show that it achieves better performance than the recently-proposed
"variational IB" method on several real-world datasets
Modularity and the spread of perturbations in complex dynamical systems
We propose a method to decompose dynamical systems based on the idea that
modules constrain the spread of perturbations. We find partitions of system
variables that maximize 'perturbation modularity', defined as the
autocovariance of coarse-grained perturbed trajectories. The measure
effectively separates the fast intramodular from the slow intermodular dynamics
of perturbation spreading (in this respect, it is a generalization of the
'Markov stability' method of network community detection). Our approach
captures variation of modular organization across different system states, time
scales, and in response to different kinds of perturbations: aspects of
modularity which are all relevant to real-world dynamical systems. It offers a
principled alternative to detecting communities in networks of statistical
dependencies between system variables (e.g., 'relevance networks' or
'functional networks'). Using coupled logistic maps, we demonstrate that the
method uncovers hierarchical modular organization planted in a system's
coupling matrix. Additionally, in homogeneously-coupled map lattices, it
identifies the presence of self-organized modularity that depends on the
initial state, dynamical parameters, and type of perturbations. Our approach
offers a powerful tool for exploring the modular organization of complex
dynamical systems.
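For flavor only, here is a small numpy sketch of the kind of quantity involved, under the assumption that a candidate partition is scored by the autocovariance of module-averaged perturbation magnitudes along a perturbed/unperturbed trajectory pair of coupled logistic maps; this is an illustrative reading, not the paper's exact definition of perturbation modularity.

```python
import numpy as np

def coupled_logistic_step(x, coupling, r=3.9):
    """One step of a diffusively coupled logistic map; rows of `coupling` sum to 1."""
    return coupling @ (r * x * (1.0 - x))

def perturbation_score(x0, coupling, partition, eps=1e-6, lag=5, steps=200, seed=0):
    """Illustrative score: summed lag-`lag` autocovariance of module-averaged
    perturbation magnitudes along a perturbed vs. unperturbed trajectory pair."""
    rng = np.random.default_rng(seed)
    x, xp = x0.copy(), np.clip(x0 + eps * rng.standard_normal(x0.shape), 0.0, 1.0)
    modules = sorted(set(partition))
    idx = {m: [i for i, p in enumerate(partition) if p == m] for m in modules}
    traj = []
    for _ in range(steps):
        x, xp = coupled_logistic_step(x, coupling), coupled_logistic_step(xp, coupling)
        diff = np.abs(xp - x)
        traj.append([diff[idx[m]].mean() for m in modules])   # coarse-grain by module
    traj = np.asarray(traj)
    traj -= traj.mean(axis=0)
    return sum(np.mean(traj[:-lag, k] * traj[lag:, k]) for k in range(traj.shape[1]))

# Example: 8 logistic maps, two planted modules with strong intra-module coupling.
n = 8
C = np.full((n, n), 0.01)
C[:4, :4] += 0.2
C[4:, 4:] += 0.2
np.fill_diagonal(C, 0.0)
C = C / C.sum(axis=1, keepdims=True) * 0.3 + np.eye(n) * 0.7   # row-stochastic coupling
score = perturbation_score(np.random.default_rng(1).uniform(0.2, 0.8, n), C,
                           partition=[0, 0, 0, 0, 1, 1, 1, 1])
```

A search over partitions (for example, greedy agglomeration) maximizing such a score would then play the role of the modularity optimization described above.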
Thermodynamics of computing with circuits
Digital computers implement computations using circuits, as do many naturally
occurring systems (e.g., gene regulatory networks). The topology of any such
circuit restricts which variables may be physically coupled during the
operation of a circuit. We investigate how such restrictions on the physical
coupling affect the thermodynamic costs of running the circuit. To do this we
first calculate the minimal additional entropy production that arises when we
run a given gate in a circuit. We then build on this calculation to analyze
how the thermodynamic costs of implementing a computation with a full circuit,
comprising multiple connected gates, depend on the topology of that circuit.
This analysis provides a rich new set of optimization problems that must be
addressed by any designer of a circuit, if they wish to minimize thermodynamic
costs.
Comment: 26 pages (6 of appendices), 5 figures.